Syntactic Tagging: Procedure for the Transition from the Analytic to the Tectogrammatical Tree Structures

Authors

  • Alena Böhmová
  • Jarmila Panevová
  • Petr Sgall
Abstract

The syntactic tagging of the Prague Dependency Treebank (PDT) is divided into two steps, the first resulting in analytic tree structures (ATS) and the second in tectogrammatical tree structures (TGTS). The present paper describes the transition procedures, automatic and manual, from ATS to TGTS and illustrates these procedures on two Czech sentences.

Syntactic tagging in the Prague Dependency Treebank project is conceived of in two steps: (i) analytic tree structures (ATS), in which every word form and punctuation mark is explicitly represented as a node of a rooted tree, with no additional nodes added (except for the root of the tree of every sentence) and with edges of the tree corresponding to (surface) syntactic dependency relations; (ii) tectogrammatical tree structures (TGTS), corresponding to the underlying sentence representations. TGTSs have the shape of dependency trees with the verb as the root of the tree and its daughter nodes representing nodes depending on the governor (on each layer of the tree). The two dimensions of the tree represent the syntactic structure of the sentence (the vertical dimension) and the topic-focus articulation of the sentence, based on the underlying word order (the horizontal dimension). In contrast to the ATSs, function words (such as prepositions, auxiliaries, subordinating conjunctions etc.) and punctuation marks are in principle not represented by nodes of their own; their functions are captured as parts of the labels (tags) of the nodes standing for autosemantic words. For technical reasons, the coordinating conjunctions are represented as specific nodes, which occupy the positions of the head nodes of coordinated constructions.

The transition from the ATSs to the TGTSs is conceived of as a transduction procedure (see [1]), consisting of two phases: (A) an automatic 'pre-processing' module, and (B) manual tagging with the help of 'user-friendly' software.
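The ATS property described above (one node per word form or punctuation mark, plus a single added root) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `AtsNode`, the helper `count_nodes`, the English toy sentence, and the short labels (`AuxS`, `Pred`, `Sb`, `AuxP`, `Obj`, `AuxK`) are chosen here for the example and are not taken from the paper's own Czech examples or tag inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AtsNode:
    """One node of a toy analytic tree (hypothetical class, for illustration)."""
    form: str   # word form or punctuation mark; "" for the added sentence root
    afun: str   # analytic function label (illustrative values only)
    children: List["AtsNode"] = field(default_factory=list)

    def add(self, child: "AtsNode") -> "AtsNode":
        """Attach a dependent node and return it so that calls can be chained."""
        self.children.append(child)
        return child


def count_nodes(node: AtsNode) -> int:
    """Count the node itself and all nodes below it."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Toy English sentence (an assumption, not one of the paper's examples).
tokens = ["Mary", "waited", "for", "Peter", "."]

ats = AtsNode("", "AuxS")                      # the only node without a token
verb = ats.add(AtsNode("waited", "Pred"))
verb.add(AtsNode("Mary", "Sb"))
prep = verb.add(AtsNode("for", "AuxP"))        # function word keeps its own node
prep.add(AtsNode("Peter", "Obj"))
ats.add(AtsNode(".", "AuxK"))                  # punctuation keeps its own node

# Every token has exactly one node, plus the added root.
assert count_nodes(ats) == len(tokens) + 1
```

In a TGTS for the same toy sentence, the nodes for "for" and "." would not survive as nodes of their own; their contribution would move into the labels of "Peter" and "waited".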
We want to illustrate here the automatic module, whose input consists of the ATSs (with both the morphological and the analytical syntactic tags accessible). The task of the module is to process the ATSs with two aims: (a) to prune the tree structures, i.e. to strip them of nodes that are counterparts to auxiliary forms in the surface structure of the sentence, without losing any important pieces of information these auxiliary forms carry; (b) to translate (by means of linguistically substantiated transduction rules) the semantically relevant information given in the ATSs into the terms of the underlying structure.

The task under (a) concerns e.g. cancellation of the auxiliary node for the sentence and other "technical" nodes, transduction of the nodes standing for the final sentence boundary into modality grammatemes of the governing verb, putting analytical forms together (and placing them in the position of the 'highest' of their parts), and adding the information they convey in the form of indices, grammatemes and other parts of the TGTS complex tags. Part (b) includes first of all the assignment of 'grammatemes' (i.e. the values of morphological categories such as number, tense, modality etc.) in those cases in which they can be derived from the ATS. The procedure under (b) mainly concerns transduction of the analytic functions (such as Subject, Object, Adverbial, Attribute) into their tectogrammatical counterparts, i.e. Actor, Patient, Addressee and different kinds of Free Modification, using the information on their form and their immediate context (e.g. the information encoded in the prepositions).
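The two tasks of the automatic module can be sketched on a toy tree as follows. This is a minimal sketch, not the authors' procedure: the class `Node`, the functions `convert` and `ats_to_tgts`, the lookup tables `FUNCTOR` and `MODALITY`, the attribute names `prep` and `modality`, and the English example are all assumptions made for illustration. In particular, the one-to-one table from analytic functions to Actor and Patient is a deliberate oversimplification; the text above says that the real rules also consult form and immediate context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy tree node used for both ATS and TGTS (hypothetical class)."""
    form: str
    label: str
    children: List["Node"] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)


# Illustrative lookup tables (assumptions, not the paper's rule set).
FUNCTOR = {"Pred": "Pred", "Sb": "Actor", "Obj": "Patient",
           "Adv": "FreeModification"}
MODALITY = {".": "declarative", "?": "interrogative"}


def convert(node: Node, prep: Optional[str] = None) -> Node:
    """Task (b): relabel an autosemantic node; task (a): fold prepositions in."""
    out = Node(node.form, FUNCTOR.get(node.label, node.label))
    if prep is not None:
        out.attrs["prep"] = prep          # keep what the pruned node conveyed
    for child in node.children:
        if child.label == "AuxP":
            # The preposition node is pruned; its dependents are lifted.
            for grandchild in child.children:
                out.children.append(convert(grandchild, prep=child.form))
        else:
            out.children.append(convert(child))
    return out


def ats_to_tgts(root: Node) -> Node:
    """Drop the added sentence root and make the main verb the TGTS root."""
    verb = next(c for c in root.children if c.label == "Pred")
    tgts = convert(verb)
    for child in root.children:
        if child.label == "AuxK":
            # Final punctuation becomes a modality value on the verb.
            tgts.attrs["modality"] = MODALITY.get(child.form, "unknown")
    return tgts


ats = Node("", "AuxS", [
    Node("waited", "Pred", [
        Node("Mary", "Sb"),
        Node("for", "AuxP", [Node("Peter", "Obj")]),
    ]),
    Node(".", "AuxK"),
])
tgts = ats_to_tgts(ats)
```

On this input, the six-node ATS is reduced to three nodes: the verb, which carries the modality value, and its two dependents, the second of which records the pruned preposition in its attributes.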

Publication date: 1999